CSC 453

Winter 2026 Day 4

Admin

  • Quiz 2 due Friday
  • Lab 2 due Monday
  • SLOsh due Monday
  • No class/lab Tuesday

Process management

Questions to consider

  • Which system calls are related to process management and lifecycles?
  • How does the process hierarchy work?
  • What are zombies and orphans? Why do zombies exist?

UNIX process APIs

  • fork() creates a new child process
    • All processes are created by forking from a parent
    • The init process is ancestor of all processes
      • Run pstree in a terminal to see
  • exec() makes a process execute a given executable (effectively replaces the process)
  • exit() terminates a process
  • wait() causes a parent to block until child terminates
  • Many variants of these system calls exist with different arguments (e.g., waitpid(), execvp())

What happens during a fork()?

  • A new process is created by making a copy of parent’s memory image
  • Parent and child each have their own address space (isolated from each other, so each can execute independently)
  • The new process is added to the OS process list and scheduled
  • Parent and child start execution just after fork (with different return values)
  • Parent and child execute and modify the memory data independently

Process management

Process creation

  • Different execution models
    • Parent & child may execute independently
    • Parent may wait for child
    • Child may create more children (Process hierarchies)
    • Parent may kill children
  • Child often invokes exec() to change its memory image to a new program
  • Why two steps (fork() then exec())?
    • Allows the child to change file descriptors and other settings before exec()

Process destruction

  • Some operating systems do not allow a child to exist if its parent has terminated: if a process terminates, then all of its children must also be terminated

    • Cascading termination: All children, grandchildren, etc., are terminated.
    • The termination is initiated by the operating system
  • The parent process may wait for termination of a child process by using the wait() system call. The call returns status information and the pid of the terminated process

    pid = wait(&status); 

Zombies and orphans

  • If a child terminates before its parent invokes wait(), the child becomes a zombie
    • Zombie = dead but not yet reaped (exit status hasn’t been read)
    • Still has an entry in the process table
    • We need zombies: so the kernel can preserve a child’s exit status until the parent calls wait(), even if the child exits first
  • If parent terminated without invoking wait(), process is an orphan
    • Orphan = alive but parent is gone
    • init benevolently adopts orphans

What isn’t clear?

Comments? Thoughts?

Modern process isolation

Questions to consider

  • What are namespaces and cgroups?
  • How do they differ from virtual machines?
  • How do containers use them to isolate processes?

Isolation without full virtualization

  • Virtual machines provide complete isolation by emulating entire hardware + OS
    • Heavier resource overhead (multiple OS instances)
    • Better isolation between workloads
  • Containers provide lightweight isolation using OS-level mechanisms
    • Share the same kernel
    • Much lower overhead than VMs
    • Linux kernel provides the building blocks: namespaces and cgroups

Namespaces: logical isolation

  • Namespaces partition global system resources so they appear as separate isolated instances
  • Each process belongs to a namespace and only sees resources in that namespace
  • Types of namespaces:
    • PID: process IDs (what processes can a process see?)
    • Network: network interfaces, ports, routing tables
    • Mount: filesystem mounts (what can a process access?)
    • IPC: IPC objects, message queues
    • UTS: hostname and domain name
    • User: user and group IDs (who owns the process?)

Namespaces example

  • Two processes in separate PID namespaces can each believe it is PID 1 (init)
  • Each sees only processes within their own namespace
  • From the host OS perspective, they have different global PIDs
  • Enables the illusion that each container has its own isolated process tree

Seeing namespaces in action

  • View namespaces a process belongs to:

    ls -l /proc/self/ns/
  • Use unshare to create a new PID namespace:

    unshare -pf --mount-proc bash      # creates new PID namespace, your shell is PID 1
    ps aux               # only sees processes in this namespace
    exit                 # back to host namespace
    ps aux               # will show all processes on the host
  • Compare namespace inodes before and after (same inode = same namespace):

    ls -i /proc/self/ns/pid
    unshare -pf --mount-proc bash -c 'ls -i /proc/self/ns/pid'  # different inode

Cgroups: resource limits

  • cgroups (control groups) limit, prioritize, and account for resource usage of process groups
  • Key capabilities:
    • CPU limits: restrict how much CPU time a group can use
    • Memory limits: cap memory usage; OOM killer invoked if exceeded
    • I/O limits: restrict disk I/O bandwidth
    • Device access: restrict which devices a process can access
  • All processes in a cgroup share the same resource limitations

Seeing cgroups in action

  • View what cgroup a process belongs to:

    cat /proc/self/cgroup
  • Check your current limits:

    cat /proc/self/limits  # shows per-process limits (some enforced by cgroups)
  • In practice, cgroups are invisible to users; the kernel enforces limits automatically when a process exceeds its allocated resources

Cgroups vs. namespaces

  • Namespaces: about visibility—what can a process see?
    • Logical isolation of resources
  • cgroups: about limits—how much can a process use?
    • Resource accounting and enforcement
  • Together: processes appear isolated AND are prevented from consuming excessive resources

Containers as a building block

  • cgroups and namespaces are mechanisms; containers let us apply policy on top of them (often through orchestration)
  • Containers (e.g. Docker) combine namespaces + cgroups + layered filesystems
  • Results in lightweight, portable process isolation
  • Single kernel, multiple isolated environments
  • Much cheaper than VMs, but with weaker isolation guarantees

What isn’t clear?

Comments? Thoughts?

Process communication

Questions to consider

  • What are the two main strategies for IPC?
  • How do they differ?
  • In what situations would you choose one over the other?

Processes give us a protection boundary

  • The operating system is responsible for isolating processes from each other
  • What you do in your own process is your own business, but it shouldn't be able to crash the machine or affect other processes (or at least processes started by other users)
  • Thus: safe intra-process communication is your problem; safe inter-process communication is an operating system problem

Why do we need IPC? What are the benefits?

  • Data Sharing: IPC allows processes to share data efficiently, which is crucial for applications requiring real-time data exchange
  • Modularity: It promotes modularity by enabling different parts of a system to communicate, making the system easier to manage and scale
  • Resource Utilization: IPC can help optimize resource utilization by allowing processes to coordinate their use of shared resources
  • Concurrency (scalability): It supports concurrent execution of processes, improving the overall performance and responsiveness of applications

What are the disadvantages of IPC?

  • Complexity: Implementing IPC can add complexity to the system, requiring careful design and management to avoid issues like deadlocks and race conditions
  • Overhead: IPC mechanisms can introduce overhead, potentially impacting performance, especially if the communication is frequent or involves large amounts of data
  • Security: Ensuring secure IPC can be challenging, as it involves protecting data from unauthorized access and ensuring the integrity of the communication
  • Debugging: Debugging IPC-related issues can be difficult, as problems may arise from interactions between multiple processes, making them harder to isolate and resolve

What are the two main categories of IPC?

  • Message passing
    • High-level abstraction for exchanging packets of information over some interconnect
  • Shared memory
    • Region of memory available to different processes; writable by at least one process

Message passing

  • Kernel establishes and oversees all communication
    • Process copies data to a buffer, then issues a system call to request transfer
    • Kernel copies data into its memory
    • Later, process issues system call to retrieve
  • Two primitives: send() and recv()
  • Beyond intra-computer communication, message passing also works between processes over a network; the link implementation is hidden from the processes

Pros and cons of message passing?

  • Pros:
    • Easier to implement and manage, especially in distributed systems
    • Provides clear boundaries between processes, enhancing security and modularity
  • Cons:
    • Can introduce overhead due to the need for message formatting and transmission
    • May be slower compared to shared memory for large volumes of data

Shared memory

  • Kernel plays a role in establishing and attaching the address space, but does not control read/write access beyond that
  • How the memory is shared, and kept consistent, is left up to the processes

Pros and cons of shared memory?

  • Pros:
    • Offers high-speed data exchange, as processes can directly read and write to the shared memory
    • Efficient for large volumes of data
  • Cons:
    • Requires careful synchronization to avoid conflicts and ensure data consistency
    • Can be more complex to implement and debug

Message passing vs. shared memory

  • Which do you choose?
    • If you have few messages?
    • If you have millions?
    • If you need to communicate across systems?
    • If you need in-order delivery but don’t want to code it yourself?
  • Considerations:
    • Cost to establish
    • Cost per message

“Gemini, make an image in the style of a video game pitting pipes versus shared memory”

What isn’t clear?

Comments? Thoughts?

IPC mechanisms

Questions to consider

  • How do pipes differ from FIFOs and when would you use each?
  • What considerations matter when choosing between pipes, FIFOs, and memory-mapped files?

IPC taxonomy

Pipes (anonymous pipes)

  • Pipes are unidirectional; one end must be designated as the reading end and the other as the writing end
  • Pipes are order preserving; all data read from the receiving end of the pipe will match the order in which it was written into the pipe
  • Pipes have a limited capacity and they use blocking I/O; if a pipe is full, any additional writes to the pipe will block the process until some of the data has been read
  • Pipes send data as unstructured byte streams. There are no pre-defined characteristics to the data exchanged, such as a predictable message length

Pipes (cont’d)

  • Pipes create a producer-consumer buffer between two processes
  • Cannot be accessed outside of the creating process and its descendants (a child inherits the pipe's file descriptors across fork())
  • The operating system manages a queue for each pipe to accommodate different input and output rates
  • Facilitates the canonical chaining together of small UNIX utilities to do more sophisticated processing

Named pipes (also known as FIFOs)

  • Of course, there is another type of pipe that doesn't share all of these characteristics
  • Named pipes are bidirectional and don't require a parent-child relationship
    • Creates a persistent file-like name
    • Require a reader and writer

Named pipes (cont’d)

  • Really a data stream (ordering, buffering, reliability, authentication all implied)
  • They are NOT regular files, though they look like them
    • Once data has been read from a FIFO, the data is discarded and cannot be read again
    • Cannot broadcast: only one read()
    • NOT wise to use bidirectionally, why?

Shared memory with mmap

  • Memory-mapped files allow multiple processes to share read-only access to a common file. Example: libc.so mapped into all running C programs
  • Memory-mapped files avoid the extra copy through the kernel's buffer cache that a normal read() performs; the file's pages are mapped directly into the user-mode portion of memory

  • Provide extremely fast IPC data exchange. i.e., when one process writes to the region, that data is immediately accessible by the other process without having to invoke a system call
  • Unlike pipes, memory-mapped files create persistent IPC. Once the data is written to the shared region, it can be repeatedly accessed by other processes

Memory mapped files: POSIX and System V

  • These two ultimately mmap a file, but do some trickery beforehand and use special files
  • POSIX == named, RAM-backed file descriptors
    • name → fd → mmap
    • Object survives until the last reference is gone
    • usually backed by tmpfs (/dev/shm)
  • System V shared memory
    • Predates POSIX - designed before file descriptors were a universal abstraction
    • Kernel-persistent objects, you must clean up manually
      • Process exits ≠ memory freed

POSIX vs. System V

  • POSIX is great, right!?
    • POSIX (e.g., shm_open()) IPC may not work in your favor on macOS
    • Some evil engineer decided to provide the header files for certain things, but they are empty stubs
      • It will compile, but could be bad
  • System V is “fine”
    • shmget(): allocate
    • shmat(): attach
    • shmdt(): detach
    • shmctl(): control (destroy with IPC_RMID)

“Other” IPC: signals

  • Signals are a limited form of asynchronous communication between processes
  • Processes can register a signal handler to run when a signal is received
  • Users can send signals to processes owned by them; the super-user can send a signal to any process
  • Processes can ignore many signals
    • SIGKILL is a notable exception; used for non-graceful termination
    • SIGTERM is used for graceful shutdown

“Other” IPC: return codes

  • Simplest, most limited form of IPC
  • Allows processes to return a single int to the process that created them
  • 0 typically indicates success; non-0, failure.
  • Analogous to older computers that would transform a set of punch cards into a “result.”
    • bash exposes return codes as $?
    • Can print the return code for the last process: echo $?

What isn’t clear?

Comments? Thoughts?